-
Notifications
You must be signed in to change notification settings - Fork 292
Merge feature/perf to master #5760
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
It is in /usr/sbin not /usr/bin Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Run xentrace in continuous tracing mode to a circular buffer (of fixed size). When a CPU spike is detected xentrace is sent a SIGTERM with `killall` to get it to dump the xentrace buffer to disk. The trace is compressed and another xentrace started. This repeats until we run out of the space on the VDI set up for tracing. CPU spikes are detected by using `rrd2csv` and watching the host `cpu_avg` which should be the average of the CPU usage of all the host's pCPUs. New flags: * -c to enable circular tracing mode (default is a -T time one-off trace) * -p <cpu-usage> e.g. -p 98 the detection point for CPU spike * -r <n> repeat count - how many high cpu usage measurements in a row before triggerring the xentrace dump * -M <n> to set the size of the circular buffer in MiB (default: 400MiB) There is also a (hardcoded for now) sleep 60 between the traces just so we don't continuously measure a very long spike. To stop circular tracing mode you have to Ctrl-C on the running xe-xentrace (e.g. under `screen`), or `killall xe-xentrace` (but this seems to prevent automatic cleanup) Signed-off-by: Edwin Török <edwin@etorok.net>
The C SDK build was failing with a recent GCC on Fedora39 like this: ``` src/xen_common.c: In function ‘xen_session_logout’: src/xen_common.c:298:5: error: ‘xen_call_’ accessing 16 bytes in a region of size 0 [-Werror=stringop-overflow=] 298 | xen_call_(session, "session.logout", params, 0, NULL, NULL); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` Use `NULL` instead of a VLA of size 0. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Requires libxml2-devel installed. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
This will be useful for testing record_util.ml Signed-off-by: Edwin Török <edwin.torok@cloud.com>
…ed case Other enums here accept mixed case, not just lower case. Fixes: 638f58e ("CP-38020: add HOST.set_numa_affinity_policy") Signed-off-by: Edwin Török <edwin.torok@cloud.com>
There are some inconsistencies between enum string values and their declarations, e.g. `balance-slb` (which has special handling in the API generator), or `depth-first` (which does not). We want to automatically generate record_util.ml, but we must ensure backwards compatibility, and the only way to do that is to exhaustively test all the old values in a unit test. This test won't need to be updated when new enum values are introduced: we'll use the automatically generated ones for those already. The `old*` files were created using the following command: ``` cp ocaml/xapi-cli-server/record_util.ml ocaml/tests/record_util/old_record_util.ml && dune build @check --profile=release && grep 'let all_' _build/o\default/ocaml/xapi-types/aPI.ml >|ocaml/tests/record_util/old_enum_all.ml && dune fmt --auto-promote ``` Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Add record_util backwards compatibility test
XAPI was just retrying endlessly in a loop saying "Could not find block device", but didn't raise any alerts that XenRT could detect. Report the redo log broken the first time we fail due to the lack of a bloc k device (which would indicate something going wrong in SM). Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Nested virt is not a supported feature yet, but when moving the setting from Xen to toolstack as part of the Xen-4.17 update I typoed the name of the setting. It was platform/nested-virt in xenguest.patch before Fixes: 664de76 ("Xen-4.15+: CDF_NESTED_VIRT") Signed-off-by: Edwin Török <edwin.torok@cloud.com>
For regular replies we look at the request and reply with a matching version. However when we fail to parse the JsonRPC request itself then we don't know what version to use. XenCenter uses JsonRPC v2 by default, and JsonRPC v2 has been supported in XAPI for a long time. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Fixes this pytype report: ``` pytype_reporter: Expected: (self, uri, transport, encoding, verbose: bool = ..., ...) pytype_reporter: Actually passed: (self, uri, transport, encoding, verbose: int, ...) pytype_reporter: Expected: (self, uri, transport, encoding, verbose, allow_none: bool = ..., ...) pytype_reporter: Actually passed: (self, uri, transport, encoding, verbose, allow_none: int) ``` Signed-off-by: Edwin Török <edwin.torok@cloud.com>
pytype claims that there is no 'errno' in the socket module, but there is: ``` Python 3.6.8 (default, Nov 16 2020, 16:55:22) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import socket >>> socket.errno.ETIMEDOUT 110 ``` Also `session.local_logout` is a remote method that is proxied by `__getattr__`, I don't know why `pytype` complains about that one, and not about `logout`, but suppress it. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
…ication This is already the case for HTTP(S) connections, but XenAPI overrides the xmlrpclib implementation to provide a Unix socket transport, and it lacks this optimization. Otherwise we've noticed that SM makes at least 4 separate connections to XAPI to query the version number for example. On a busy system each of those connections can get delayed by 100ms: * OCaml's tick thread switches threads every 50ms, and you might need 2 switches when XAPI is busy with CPU-bound workloads: * one to accept the connection * another to handle the request on the newly spawned thread See: https://github.com/python/cpython/blob/v2.7.5/Lib/xmlrpclib.py#L1361-L1371 https://github.com/python/cpython/blob/v3.6.8/Lib/xmlrpc/client.py#L1237-L1245 Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Old version of XenAPI.py used to look at the API version to implement a compatibility layer, but that code got dropped and the API version is now completely unused: df9b539 ("CA-35286: Remove forward compatability code from the python XenAPI module in favour of the compatability layer in xapi") Remove the unused argument, this will allow us to avoid making 4 additional API calls after every login. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
The API version is not used anymore, so remove the 4 additional queries it'd make. For backward compatibility retain the field as a cached property: should a client of this code want to access the API version, then it'll make the 4 queries at that time, and cache the result for the duration of the API object. Cached properties are writable, so the fallback that writes the API version to 1.1 should keep working too Signed-off-by: Edwin Török <edwin.torok@cloud.com>
CP-48623: reduce XenAPI.py connection rate and drop 4 useless API calls
CA-381119: Use JsonRPC v2
CA-389506: fix platform:nested_virt typo
redo_log: report redo log as broken if we cannot find the block device
This forces us to fully declare the dependencies of our code, and not rely on libraries that are brought in only as transitive dependencies of other libraries we happen to link to. E.g. if our module A depends on library X, which itself depends on library Y, then currently by linking X we also get Y linked and accessible from A directly. If code in module A uses both module X and Y *directly* then it needs to declare a dependency on both when implicit transitive deps are off or it gets a link failure (typically an error about a module or type being abstract). If the code in module A only uses module X then no change is needed (X can still use Y and the final executable will link both, it is just a question of what is visible and callable from A directly). This is especially useful when writing new code to get dependencies correct from the beginning. Signed-off-by: Edwin Török <edwin.torok@cloud.com> (cherry picked from commit 7203430)
…check Bytecode builds for `http_lib` are disabled due to '(modes best)', and that means that anything that depends on it must have it disabled too to avoid this warning. Avoids these kinds of warnings: ``` File "_none_", line 1: Error: Module `Buf_io' is unavailable (required by `Http_svr') ``` Signed-off-by: Edwin Török <edwin.torok@cloud.com>
This will be a new library that will provide a more type-safe interface to file descriptor operations. Useful on its own, but also for testing stdext. Minimal dependencies, only Unix (and Alcotest for testing). Signed-off-by: Edwin Török <edwin.torok@cloud.com>
This will be a test framework providing QCheck generators and properties for testing file descriptor operations. It will try to generate: * different kinds of file descriptors * actual data written/read on the other end of pipes and socket pairs * different speeds and delays on the other end to find buffering bugs * file descriptors that are >1024 to find bugs with select Signed-off-by: Edwin Török <edwin.torok@cloud.com>
We are going to use type-level constraints a lot. Try to future proof it by using the recommended compiler flag. `ocamlc` says this about `-principal`: > When using labelled arguments and/or polymorphic methods, this flag is required to > ensure future versions of the compiler will be able to infer types correctly, even if internal algorithms change Signed-off-by: Edwin Török <edwin.torok@cloud.com>
We'll check statically that we are not using Unix.select, but it is good to have a runtime check too, in case some library (C or OCaml) indirectly uses it. Default is 0 but can be set to 1024 to test for the absence of Unix.select. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
…rtup failures If 'server_init' raised an exception previously we wouldn't be able to log the full stacktrace. Enable backtraces earlier to ensure that we can. We need to use Debug.with_thread_associated instead of just Backtrace.with_backtraces, because if the thread is not registered, then xapi-backtrace won't print the backtrace even if it has one. A startup failure now looks like this in the logs: ``` Jun 19 04:58:41 lcy2-dt72 xapi: [error||0 ||backtrace] Xapi.watchdog failed with exception Unix.Unix_error(Unix.EMFILE, "dup", "") Jun 19 04:58:41 lcy2-dt72 xapi: [error||0 ||backtrace] Raised Unix.Unix_error(Unix.EMFILE, "dup", "") Jun 19 04:58:41 lcy2-dt72 xapi: [error||0 ||backtrace] 1/7 xapi Raised at file ocaml/libs/xapi-stdext/lib/xapi-stdext-unix/unixext.ml, line 873 Jun 19 04:58:41 lcy2-dt72 xapi: [error||0 ||backtrace] 2/7 xapi Called from file ocaml/xapi/xapi.ml, line 936 Jun 19 04:58:41 lcy2-dt72 xapi: [error||0 ||backtrace] 3/7 xapi Called from file ocaml/xapi/xapi.ml, line 946 Jun 19 04:58:41 lcy2-dt72 xapi: [error||0 ||backtrace] 4/7 xapi Called from file ocaml/xapi/xapi.ml, line 1535 Jun 19 04:58:41 lcy2-dt72 xapi: [error||0 ||backtrace] 5/7 xapi Called from file ocaml/xapi/xapi.ml, line 1541 Jun 19 04:58:41 lcy2-dt72 xapi: [error||0 ||backtrace] 6/7 xapi Called from file ocaml/xapi/xapi.ml, line 1548 Jun 19 04:58:41 lcy2-dt72 xapi: [error||0 ||backtrace] 7/7 xapi Called from file ocaml/libs/log/debug.ml, line 250 ``` Signed-off-by: Edwin Török <edwin.torok@cloud.com>
This commit got split between the 3 epoll PRs, and the Makefile part got missed: f4baafa ("Test for absence of select: introduce open_1024 in tests") The build still worked locally because I have ulimit > 1024 (and apparently in the CI too?), but it failed in Koji where we had a 1024 limit. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Merge master into feature/perf
This code snippet was unchanged since the beginning, but pylint complained: ``` R1720: Unnecessary "else" after "raise", remove the "else" and de-indent the code inside it (no-else-raise) ``` Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Fixes: 500a1f7 ("XenAPI: suppress pytype false positives") Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Codecov ReportAll modified and coverable lines are covered by tests ✅ Additional details and impacted files@@ Coverage Diff @@
## master #5760 +/- ##
========================================
+ Coverage 44.8% 51.4% +6.5%
========================================
Files 16 13 -3
Lines 2210 1928 -282
========================================
- Hits 992 991 -1
+ Misses 1218 937 -281 see 4 files with indirect coverage changes
Flags with carried forward coverage won't be shown. Click here to find out more. |
I'm surprised this doesn't cause problems in xs-opam, I've tried turning them off, but the xen-built xenctrl findlib libraries have a different name from the dummy xenctrl's ( |
There is an explicit dependency on I haven't tried building this in the xs-opam CI, but it builds in |
@edwintorok the branch has a conflict, does it need another merge from master? |
yes, I can't push to the branch directly. I'll raise another PR to feature/perf to update it, and then we should really merge this branch or this'll keep happening (conflicts with other PRs being merged). |
PR here #5804 |
Update feature/perf from master
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume all changes in feature/perf
have been reviewed in PRs previously, and subsequently tested extensively. So this is good to go now.
Whats new: